Slipstream Execution Mode for CMP-Based Multiprocessors
Abstract
Scalability of applications on distributed shared-memory (DSM) multiprocessors is limited by communication overheads. At some point, using more processors to increase parallelism yields diminishing returns or even degrades performance. When increasing concurrency is futile, we propose an additional mode of execution, called slipstream mode, that instead enlists extra processors to assist parallel tasks by reducing perceived overheads. We consider DSM multiprocessors built from dual-processor chip multiprocessor (CMP) nodes with a shared L2 cache. A task is allocated on one processor of each CMP node. The other processor of each node executes a reduced version of the same task. The reduced version skips shared-memory stores and synchronization, so it runs ahead of the true task. Even with the skipped operations, the reduced task makes accurate forward progress and generates an accurate reference stream, because branches and addresses depend primarily on private data. Slipstream execution mode yields two benefits. First, the reduced task prefetches data on behalf of the true task. Second, reduced tasks provide a detailed picture of future reference behavior, enabling a number of optimizations aimed at accelerating coherence events, e.g., self-invalidation. For multiprocessor systems with up to 16 CMP nodes, slipstream mode outperforms running one or two conventional tasks per CMP in 7 out of 9 parallel scientific benchmarks. Slipstream mode is 12-19% faster with prefetching only and up to 29% faster with self-invalidation enabled.
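The prefetching benefit described above can be illustrated with a toy Python model. This is a sketch under strong simplifying assumptions, not the authors' simulator: a task is a flat list of hypothetical `(kind, address, is_shared)` operations, the shared L2 is an unbounded set of addresses, and `reduce_task`, `true_task_misses`, and the `lookahead` parameter are illustrative names. Coherence, timing, and private-data copies are not modeled. The reduced task drops shared-memory stores and barriers, runs a fixed distance ahead, and warms the shared L2 so the true task's loads hit.

```python
"""Toy model of slipstream prefetching on one dual-processor CMP node.

Assumption: a task is a flat trace of (kind, address, is_shared) tuples and
the shared L2 is an unbounded set of cached addresses. Timing, coherence and
DSM remote-miss costs are NOT modeled.
"""
from typing import List, Optional, Tuple

LOAD, STORE, BARRIER = "load", "store", "barrier"
Op = Tuple[str, Optional[int], bool]  # (kind, address, is_shared)


def reduce_task(trace: List[Op]) -> List[Op]:
    """Hypothetical reduced task: skip shared-memory stores and synchronization."""
    return [
        (kind, addr, shared)
        for kind, addr, shared in trace
        if kind != BARRIER and not (kind == STORE and shared)
    ]


def true_task_misses(trace: List[Op], slipstream: bool, lookahead: int = 4) -> int:
    """Count shared-L2 misses seen by the true task.

    With slipstream=True, the reduced task on the other processor is kept
    `lookahead` positions ahead (measured in reduced-stream operations) and
    its accesses fill the shared L2 before the true task needs them.
    """
    l2: set = set()
    reduced = reduce_task(trace) if slipstream else []
    r = 0  # next operation the reduced task will execute
    misses = 0
    for i, (kind, addr, _shared) in enumerate(trace):
        # Advance the reduced task so it stays ahead of the true task.
        while r < len(reduced) and r < i + lookahead:
            l2.add(reduced[r][1])  # reduced-task access brings the line into L2
            r += 1
        if kind == BARRIER:
            continue  # in this model, synchronization touches no data
        if addr not in l2:
            misses += 1
            l2.add(addr)
    return misses


# Example: each iteration loads a shared input and stores a shared result.
trace: List[Op] = []
for k in range(8):
    trace.append((LOAD, 100 + k, True))
    trace.append((STORE, 200 + k, True))
trace.append((BARRIER, None, False))

print(true_task_misses(trace, slipstream=False))  # 16
print(true_task_misses(trace, slipstream=True))   # 8
```

In this toy trace only the loads benefit: the shared stores still miss because the reduced task skips them. The self-invalidation optimization mentioned in the abstract is not modeled here.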
Similar Resources
Extending OpenMP to Support Slipstream Execution Mode
OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalability of applications based on this standard. This paper investigates the implementation of an OpenMP compiler that supports slipstream execution mode, a new optimization mechanism for CMP-based distributed shared memory...
Implementation of dynamic synchronization for slipstream multiprocessor
SIVAGNANAM, SUBHASHINI. Implementation of dynamic synchronization for Slipstream Multiprocessors (under the direction of Dr. Gregory T. Byrd). The main goal of parallelization is speedup. As the number of processors increases, there is little or no speedup, since a performance threshold is reached for a fixed problem size. This is because scalability for a parallel program is limited by the co...
Embedding a superscalar processor onto a chip multiprocessor
Chip multiprocessors (CMPs) aim to exploit both instruction-level and thread-level parallelism to boost a system's performance. However, according to previous research results, CMPs outperform superscalar processors only in floating-point applications. Therefore, we have proposed a novel microprocessor, supporting two execution modes, to allow users to manually choose an appropriate mode to ex...
TLS Chip Multiprocessors: Micro-Architectural Mechanisms for Fast Tasking with Out-of-Order Spawn
Chip Multiprocessors (CMP) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since ...
Optimization Strategy of Parallel Query Processing Based on Multi-core Architecture
A chip multiprocessor (CMP) can execute more than two threads simultaneously, and each core owns some execution units. Because many threads share various CMP resources, such as the L2 cache, a CMP system is inherently different from a conventional multiprocessor system, and it is also different from simultaneous multithreading (SMT). In this paper a novel and complete appro...